1 About

1.1 Contributions

Please note that authorship is alphabetical. Contributions are listed below - see GitHub for details and who to blame for what :-).

Ben Anderson (@dataknut)

1.3 Citation

If you wish to refer to any of the material from this report please cite as:

  • Anderson, B. (2019) Air Quality in New Zealand, University of Southampton: Southampton, UK.

Report circulation:

  • Public.

Report purpose:

  • to explore official New Zealand Air Quality data

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 700386 (SPATIALEC).

This work is (c) 2019 the University of Southampton.

2 Introduction

3 PM10 data

The PM10 data has more sensors and wider coverage than the PM2.5 data (93 sites across 15 councils, compared with 35 sites across 9 councils).

Data source: https://data.mfe.govt.nz/data/category/air/

Data file: mfe-pm10-concentrations-200617-CSV/pm10-concentrations-200617.csv

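# dPath (data folder) and pm10File (file name) are assumed to be defined in the report setup (not shown)
df <- paste0(dPath, pm10File)

pm10dt <- data.table::fread(df)
# parse the date string into a Date column
pm10dt[, ba_date := lubridate::as_date(date)]
# note: the data is daily but there may be gaps
# combine council and site into a single identifier for grouping and plotting
pm10dt[, council.site := paste0(council, ".", site)]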

Overall there are:

  • 93 sites spread over
  • 15 councils
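
The counts above could be derived directly from the loaded table. The following is a minimal sketch only, assuming a small made-up table (`toyDT`, a hypothetical stand-in for `pm10dt`) with the same `council` and `site` columns; the values are illustrative and are not taken from the MfE file.

```r
library(data.table)

# toy stand-in for pm10dt (hypothetical values, not real data)
toyDT <- data.table(
  council = c("AC", "AC", "AC", "ECAN", "ECAN"),
  site    = c("Botany", "Botany", "Takapuna", "Timaru", "Timaru")
)
toyDT[, council.site := paste0(council, ".", site)]

nSites    <- uniqueN(toyDT$council.site) # distinct council/site pairs
nCouncils <- uniqueN(toyDT$council)      # distinct councils
```

Counting on `council.site` rather than `site` guards against two councils using the same site name.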
# looks like daily data with gaps
p <- makeTilePlot(pm10dt, yvar = "pm10", byvar = "council.site")
p + labs(y = "pm10") + guides(fill = guide_legend(title = "pm10"))

Figure 3.1: Test data values by date and site
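
`makeTilePlot()` appears to be a project helper whose source is not shown in this report. The sketch below is one way such a helper might be written with ggplot2. The function name `makeTilePlotSketch`, its `xvar` argument and the toy data are all assumptions made for illustration, not the report's own implementation.

```r
library(ggplot2)

# Hypothetical stand-in for makeTilePlot(); the report's own helper is not shown.
# Draws one tile per date and group, filled by the measured value, so that
# days without an observation appear as blank cells.
makeTilePlotSketch <- function(dt, xvar = "ba_date", yvar, byvar) {
  ggplot(dt, aes(x = .data[[xvar]], y = .data[[byvar]], fill = .data[[yvar]])) +
    geom_tile() +
    labs(x = "Date", y = byvar)
}

# toy data (hypothetical values): the first site has no reading on day 3
toy <- data.frame(
  ba_date = as.Date("2017-01-01") + c(0, 1, 3, 0, 1, 2),
  council.site = rep(c("AC.Botany", "ECAN.Timaru"), each = 3),
  pm10 = c(12, 15, 9, 20, 31, 18)
)
p <- makeTilePlotSketch(toy, yvar = "pm10", byvar = "council.site")
```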

# looks like daily data with gaps
p <- makeLinePlot(pm10dt, yvar = "pm10", byvar = "council.site")
p <- p + labs(y = "pm10") + guides(colour = guide_legend(title = "pm10"))

plotly::ggplotly(p)

Figure 3.2: Test data values by date and site
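
The plots suggest daily data with gaps. One way to quantify this, sketched here on a made-up table (`toyDT`, a hypothetical stand-in for `pm10dt`), is to compare the number of observed days at each site with the number of days between that site's first and last observation. The column names `nExpected` and `nMissing` are introduced for illustration only.

```r
library(data.table)

# toy stand-in for pm10dt (hypothetical values): AC.Botany skips one day
toyDT <- data.table(
  council.site = c(rep("AC.Botany", 3), rep("ECAN.Timaru", 4)),
  ba_date = as.Date("2017-01-01") + c(0, 1, 3, 0, 1, 2, 3)
)

# per site: observed days vs. days spanned by the first and last observation
gaps <- toyDT[, .(
  nObs = uniqueN(ba_date),
  nExpected = as.integer(max(ba_date) - min(ba_date)) + 1L
), keyby = council.site]
gaps[, nMissing := nExpected - nObs]
```

This only counts gaps inside each site's own observation period; it says nothing about sites that started late or stopped early.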

4 PM2.5 data

The PM2.5 data has fewer sensors and less coverage than the PM10 data (35 sites across 9 councils, compared with 93 sites across 15 councils).

Data source: https://data.mfe.govt.nz/data/category/air/

Data file: mfe-pm25-concentrations-200817-CSV/pm25-concentrations-200817.csv

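# dPath (data folder) and pm2.5File (file name) are assumed to be defined in the report setup (not shown)
df <- paste0(dPath, pm2.5File)

pm2.5dt <- data.table::fread(df)
# parse the date string into a Date column
pm2.5dt[, ba_date := lubridate::as_date(date)]
# note: the data is daily but there may be gaps
# combine council and site into a single identifier for grouping and plotting
pm2.5dt[, council.site := paste0(council, ".", site)]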

Overall there are:

  • 35 sites spread over
  • 9 councils
# looks like daily data with gaps
p <- makeTilePlot(pm2.5dt, yvar = "pm2_5", byvar = "council.site")
p + labs(y = "pm2_5") + guides(fill = guide_legend(title = "pm2_5"))

Figure 4.1: Test data values by date and site

# looks like daily data with gaps
p <- makeLinePlot(pm2.5dt, yvar = "pm2_5", byvar = "council.site")
p <- p + labs(y = "pm2_5") + guides(colour = guide_legend(title = "pm2_5"))

plotly::ggplotly(p)

Figure 4.2: Test data values by date and site

5 Statistical Annex

5.1 PM10

skimr::skim(pm10dt)
## Skim summary statistics
##  n obs: 209964 
##  n variables: 10 
## 
## ── Variable type:character ───────────────────────────────────────────────────────────────────
##      variable missing complete      n min max empty n_unique
##       council       0   209964 209964   2   5     0       15
##  council.site       0   209964 209964   8  34     0       93
##          date       0   209964 209964  19  19     0     4382
##        method       0   209964 209964   3   4     0        2
##          site       0   209964 209964   4  28     0       93
## 
## ── Variable type:Date ────────────────────────────────────────────────────────────────────────
##  variable missing complete      n        min        max     median
##   ba_date       0   209964 209964 2006-01-02 2017-12-31 2011-12-27
##  n_unique
##      4382
## 
## ── Variable type:logical ─────────────────────────────────────────────────────────────────────
##            variable missing complete      n mean
##   complete_for_mean    1152   208812 209964 0.74
##  complete_for_trend    1152   208812 209964 0.64
##       complete_year   74218   135746 209964 1   
##                              count
##  TRU: 154552, FAL: 54260, NA: 1152
##  TRU: 134660, FAL: 74152, NA: 1152
##             TRU: 135746, NA: 74218
## 
## ── Variable type:numeric ─────────────────────────────────────────────────────────────────────
##  variable missing complete      n  mean    sd  p0 p25 p50   p75   p100
##      pm10       0   209964 209964 17.12 12.83 -10 9.9  14 20.22 433.75
##      hist
##  ▇▁▁▁▁▁▁▁
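
The summary above reports a minimum PM10 value of -10, which cannot be a physical concentration. A sketch, on made-up values, of how such readings could be counted per site before deciding how to treat them; `toyDT` and `negBySite` are hypothetical names used only for illustration.

```r
library(data.table)

# toy stand-in for pm10dt (hypothetical values)
toyDT <- data.table(
  council.site = c("AC.Botany", "AC.Botany", "ECAN.Timaru", "ECAN.Timaru"),
  pm10 = c(12.5, -10, 20.2, 433.75)
)

# number of negative readings and total readings at each site
negBySite <- toyDT[, .(nNegative = sum(pm10 < 0), nObs = .N), keyby = council.site]
```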
# count the number of observations at each site
t <- pm10dt[, .(nObs = .N), keyby = .(council, site)]

kableExtra::kable(t, caption = "N obs at sites (PM10)") %>% kable_styling()
Table 5.1: N obs at sites (PM10)
council site nObs
AC BeachlandsMobile 340
AC Botany 1229
AC BotanyDowns 1813
AC GlenEden 3975
AC Helensville 342
AC Henderson 3951
AC KhyberPass 1172
AC KhyberPassRoada 1701
AC Kingsland 602
AC Kumeu 2188
AC MobilePOAL 1327
AC MountEdenIIb 18
AC Orewa 2439
AC Pakuranga 3979
AC Penrose 2114
AC PenroseIIb 1797
AC PenroseIVd 15
AC Pukekohe 802
AC Putamahoe 3944
AC Takapuna 3954
AC Waiheke 663
AC Waiuku 297
AC Warkworth 570
AC Whangaparaoa 2239
BOPRC Rotorua_at_Edmund_Road 3410
BOPRC Rotorua_at_Fenton_St 328
BOPRC Rotorua_at_Ngapuna 2279
BOPRC Rotorua_at_Pererika_Street 1947
BOPRC Tauranga_at_Morland_Fox_Park 2367
BOPRC Tauranga_at_Otumoetai 3291
BOPRC Whakatane_at_Kopeopeo 2309
ECAN Ashburton 4213
ECAN ChRiccRd 606
ECAN ChStA 4208
ECAN ChWoolston 4201
ECAN Geraldine 3761
ECAN Kaiapoi 4120
ECAN Rangiora 4138
ECAN Timaru 4201
ECAN Waimate 3738
ECAN Washdyke 3363
GDC GisborneBoysHigh 1671
GWRC LowerHutt 2180
GWRC MastertonEast 1596
GWRC MastertonWest 3425
GWRC UpperHutt 3966
GWRC Wainuiomata 3777
HBRC Awatoto 2043
HBRC Marewa_Park 4303
HRC Taihape 4039
HRC Taumarunui 2664
MDC Blenheim 3833
NCC Airshed_A 4344
NCC Airshed_B1 3974
NCC Airshed_B2 972
NCC Airshed_C 935
NCC Airshed_C2 134
NRC Kaitaia 349
NRC Kerikeri 323
NRC RobertSt 3571
NRC Ruakaka 1449
ORC Alexandra 3744
ORC Arrowtown 2772
ORC Balclutha 1459
ORC Clyde 1893
ORC Cromwell 1931
ORC Dunedin 3429
ORC Lawrence 536
ORC Milton 1765
ORC Mosgiel 2971
ORC Naseby 97
ORC Oamaru 474
ORC Palmerston 318
ORC Ranfurly 231
ORC Roxburgh 92
SRC Gore 3850
SRC Invercargill 2806
SRC Winton 1264
TDC Richmond 3920
WCRC Reefton 3721
WRC Cambridge 1129
WRC Hamilton_Claudelands 975
WRC Hamilton_Ohaupo_Rd 1723
WRC Hamilton_Peachgrove_Rd 2833
WRC Matamata 2626
WRC Morrinsville 526
WRC Putaruru 3731
WRC Taupo 3542
WRC Te_Awamutu 1199
WRC Te_Kuiti 3976
WRC Thames 263
WRC Tokoroa 3977
WRC Turangi 2692

5.2 PM2.5

skimr::skim(pm2.5dt)
## Skim summary statistics
##  n obs: 33750 
##  n variables: 10 
## 
## ── Variable type:character ───────────────────────────────────────────────────────────────────
##      variable missing complete     n min max empty n_unique
##       council       0    33750 33750   2   4     0        9
##  council.site       0    33750 33750   7  26     0       35
##          date       0    33750 33750  10  10     0     3617
##        method       0    33750 33750   3   3     0        1
##          site       0    33750 33750   4  22     0       35
## 
## ── Variable type:Date ────────────────────────────────────────────────────────────────────────
##  variable missing complete     n        min        max     median n_unique
##   ba_date       0    33750 33750 2008-01-01 2017-11-25 2014-03-05     3617
## 
## ── Variable type:logical ─────────────────────────────────────────────────────────────────────
##            variable missing complete     n mean
##   complete_for_mean       0    33750 33750 0.56
##  complete_for_trend       0    33750 33750 0.32
##       complete_year   14594    19156 33750 1   
##                          count
##  TRU: 18763, FAL: 14987, NA: 0
##  FAL: 22856, TRU: 10894, NA: 0
##          TRU: 19156, NA: 14594
## 
## ── Variable type:numeric ─────────────────────────────────────────────────────────────────────
##  variable missing complete     n mean   sd    p0 p25  p50  p75 p100
##     pm2_5       0    33750 33750 8.19 7.89 -0.43 4.1 5.79 8.69  180
##      hist
##  ▇▁▁▁▁▁▁▁
# count the number of observations at each site
t <- pm2.5dt[, .(nObs = .N), keyby = .(council, site)]

kableExtra::kable(t, caption = "N obs at sites (PM2.5)") %>% kable_styling()
Table 5.2: N obs at sites (PM2.5)
council site nObs
AC BeachlandsMobile 219
AC Helensville 341
AC MobilePOAL 1024
AC POAL 303
AC Patumahoe 3109
AC Penrose 2129
AC PenroseIIb 1090
AC Pukekohe 352
AC Takapuna 3244
AC Waiheke 670
AC Waiuku 296
AC Warkworth 316
AC Whangaparaoa 2247
ECAN Ashburton 703
ECAN ChRiccRd 186
ECAN ChStA 2412
ECAN ChWoolston 2116
ECAN Geraldine 561
ECAN Kaiapoi 361
ECAN Rangiora 742
ECAN Timaru 2003
ECAN Waimate 705
ECAN Washdyke 791
GWRC MastertonEast 1032
GWRC MastertonWest 2066
GWRC Wainuiomata 1684
HBRC Awatoto 409
HBRC St_John_s 486
MDC Blenheim 339
NCC Airshed_A 523
NRC RobertSt 155
TDC Richmond 370
WRC Hamilton_Claudelands 336
WRC Hamilton_Peachgrove_Rd 43
WRC Tokoroa 387

6 Runtime

Report generated using knitr in RStudio with R version 3.5.2 (2018-12-20) running on x86_64-apple-darwin15.6.0 (Darwin Kernel Version 17.7.0: Wed Apr 24 21:17:24 PDT 2019; root:xnu-4570.71.45~1/RELEASE_X86_64).

t <- proc.time() - startTime

elapsed <- t[[3]]

Analysis completed in 24.649 seconds (0.41 minutes).
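
The timing pattern can be sketched end to end as follows. It assumes `startTime` is captured with `proc.time()` at the top of the report (that line is not shown here), and `Sys.sleep()` stands in for the analysis itself.

```r
# assumed to be set at the start of the report (not shown in the original)
startTime <- proc.time()

Sys.sleep(0.1) # placeholder for the analysis

t <- proc.time() - startTime
elapsed <- t[[3]] # third element is elapsed (wall clock) time in seconds

msg <- sprintf(
  "Analysis completed in %.3f seconds (%.2f minutes).",
  elapsed, elapsed / 60
)
```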

R packages used:

  • data.table
  • ggplot2
  • here
  • kableExtra
  • lubridate
  • plotly
  • skimr

7 References